This paper proposes a novel geo-tracking method, namely continuous metric self-localization in outdoor environments by registering a vehicle's sensor information with aerial imagery of an unseen target region. Geo-tracking offers the potential to replace the noisy signals of Global Navigation Satellite Systems (GNSS) and the expensive, hard-to-maintain prior maps that are typically used for this purpose. The proposed method aligns data from on-board cameras and a LiDAR sensor with geo-registered orthophotos to continuously localize the vehicle. We train a model in a metric learning setting to extract visual features from ground and aerial images. The ground features are projected into a top-down perspective via the LiDAR points and matched with the aerial features to determine the relative pose between the vehicle and the orthophoto. Our method is the first to perform metric self-localization on unseen orthophotos with on-board cameras in an end-to-end differentiable model. It exhibits strong generalization, is robust to changes in the environment, and requires only geo-poses as ground truth. We evaluate our approach on the KITTI-360 dataset and achieve a mean absolute position error (APE) of 0.94 m. We further compare against previous approaches on the KITTI odometry dataset and achieve state-of-the-art results on the geo-tracking task.
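The core matching step lends itself to a compact illustration. The sketch below is not the paper's code: it scatters per-point camera features into a top-down (bird's-eye-view) grid via the LiDAR points and then exhaustively cross-correlates that grid against a larger aerial feature map. The grid resolution, all names, and the restriction to a translational offset (the full method estimates a relative pose with a differentiable model) are assumptions for illustration.

```python
import numpy as np

def project_to_bev(points_xyz, feats, grid_size=256, resolution=0.2):
    """Scatter per-point ground-image features into a top-down (BEV) grid.

    points_xyz: (N, 3) LiDAR points in the vehicle frame.
    feats:      (N, C) visual features sampled from the camera images.
    """
    C = feats.shape[1]
    bev = np.zeros((grid_size, grid_size, C), dtype=np.float32)
    counts = np.zeros((grid_size, grid_size, 1), dtype=np.float32)
    # Map metric x/y coordinates to grid cells centred on the vehicle.
    ij = (points_xyz[:, :2] / resolution + grid_size / 2).astype(int)
    valid = ((ij >= 0) & (ij < grid_size)).all(axis=1)
    for (i, j), f in zip(ij[valid], feats[valid]):
        bev[i, j] += f
        counts[i, j] += 1.0
    return bev / np.maximum(counts, 1.0)  # average feature per cell

def match_translation(bev, aerial):
    """Exhaustive 2-D cross-correlation of the BEV template against a
    (larger) aerial feature map; returns the best translational offset."""
    gh, gw, _ = bev.shape
    ah, aw, _ = aerial.shape
    best, best_uv = -np.inf, (0, 0)
    for u in range(ah - gh + 1):
        for v in range(aw - gw + 1):
            score = np.sum(bev * aerial[u:u + gh, v:v + gw])
            if score > best:
                best, best_uv = score, (u, v)
    return best_uv
```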
While current interactive video object segmentation (iVOS) approaches rely on scribble-based interactions to generate precise object masks, we propose a click-based interactive video object segmentation (CiVOS) framework to simplify the required user workload as much as possible. CiVOS builds on decoupled modules reflecting user interaction and mask propagation. The interaction module converts click-based interactions into an object mask, which is then propagated to the remaining frames by the propagation module. Additional user interactions allow the object mask to be refined. The approach is extensively evaluated on the popular interactive DAVIS dataset, with the unavoidable adaptation of its scribble-based interactions to click-based counterparts. We consider several strategies for generating clicks during the evaluation to reflect a variety of user inputs, and we adapt the DAVIS performance metric to allow a hardware-independent comparison. The proposed CiVOS pipeline achieves competitive results while requiring a lower user workload.
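A minimal sketch of the decoupled control flow described above, assuming the two trained networks are available as callables; the names and signatures are illustrative, not the released CiVOS interfaces:

```python
def civos_round(frames, clicks, t, interaction_module, propagation_module,
                prev_masks=None):
    """One user interaction round, mirroring the decoupled design:
    clicks on frame t are converted into an object mask, which is then
    propagated to the remaining frames. The module arguments are assumed
    stand-ins for the trained networks."""
    prev = None if prev_masks is None else prev_masks[t]
    mask = interaction_module(frames[t], clicks, prev)   # clicks -> mask
    return propagation_module(frames, mask, t)           # mask -> all frames
```

Refinement then amounts to calling `civos_round` again with new clicks and the previously propagated masks.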
Semantic segmentation models require large amounts of hand-labeled training data, which is expensive and time-consuming to produce. To this end, we present a label fusion framework that is capable of improving the semantic pixel labels of video sequences in an unsupervised manner. We make use of a 3D mesh representation of the environment and fuse the predictions of different frames into a consistent representation using semantic mesh textures. Rendering the semantic mesh with the original intrinsic and extrinsic camera parameters yields a set of improved semantic segmentation images. Thanks to our optimized CUDA implementation, we are able to exploit the full $C$-dimensional probability distribution over $C$ classes in an uncertainty-aware manner. We evaluate our method on the ScanNet dataset, where we improve the annotations from $52.05\%$ to $58.25\%$. We publish the source code of our framework online to foster future research in this area (\url{https://github.com/fferflo/semantic-meshes}). To the best of our knowledge, this is the first publicly available label fusion framework based on meshes with semantic textures.
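The abstract does not spell out the fusion rule, but one standard uncertainty-aware way to combine full $C$-dimensional class distributions observed for the same mesh texel across frames is to accumulate log-probabilities, as in this hedged numpy sketch:

```python
import numpy as np

def fuse_texel(prob_maps):
    """Fuse per-frame class distributions for one texel by accumulating
    log-probabilities (a product of the C-dimensional distributions),
    so confident and uncertain predictions are weighted accordingly."""
    log_sum = np.zeros_like(prob_maps[0])
    for p in prob_maps:
        log_sum += np.log(np.clip(p, 1e-12, 1.0))
    fused = np.exp(log_sum - log_sum.max())     # stabilised exponentiation
    return fused / fused.sum()                  # renormalised distribution

# Example: three frames observe the same texel with C = 4 classes.
obs = [np.array([0.6, 0.2, 0.1, 0.1]),
       np.array([0.5, 0.3, 0.1, 0.1]),
       np.array([0.7, 0.1, 0.1, 0.1])]
print(fuse_texel(obs))  # sharpened consensus distribution
```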
Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
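The following sketch is not Logic Mill's actual API; it only illustrates the underlying idea of embedding documents with a pre-trained language model and ranking them by similarity, here using the sentence-transformers library and a small off-the-shelf model as stand-ins for the production setup:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Encode documents with a pre-trained language model and rank by cosine
# similarity; the model choice and corpus scale differ from Logic Mill.
model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["A method for quantum community detection ...",
        "Patent: apparatus for qubit error correction ...",
        "Deep learning for job shop scheduling ..."]
emb = model.encode(docs, normalize_embeddings=True)
query = model.encode(["community detection on quantum hardware"],
                     normalize_embeddings=True)
scores = emb @ query.T             # cosine similarity (unit-norm vectors)
print(np.argsort(-scores[:, 0]))   # indices of the most similar documents
```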
The analysis of network structure is essential to many scientific areas, ranging from biology to sociology. As the computational task of clustering these networks into partitions, i.e., solving the community detection problem, is generally NP-hard, heuristic solutions are indispensable. The exploration of expedient heuristics has led to the development of particularly promising approaches in the emerging technology of quantum computing. Motivated by the substantial hardware demands of all established quantum community detection approaches, we introduce a novel QUBO-based approach that needs only as many qubits as the graph has nodes and is represented by a QUBO matrix as sparse as the input graph's adjacency matrix. The substantial improvement in the sparsity of the QUBO matrix, which is typically very dense in related work, is achieved through the novel concept of separation-nodes. Instead of assigning every node to a community directly, this approach relies on the identification of a separation-node set, which -- upon its removal from the graph -- yields a set of connected components representing the core components of the communities. A greedy heuristic then assigns the nodes from the separation-node set to the identified community cores, as sketched below, and subsequent experimental results yield a proof of concept. This work hence displays a promising approach to NISQ-ready quantum community detection, catalyzing the application of quantum computers to the network structure analysis of large-scale, real-world problem instances.
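The QUBO formulation itself is the paper's contribution; only the classical post-processing step is sketched here: deriving community cores as connected components after removing the separation nodes, then greedily attaching each separation node to the core it shares the most edges with. The separation-node set is assumed given (in the paper it comes from solving the QUBO), and the greedy tie-breaking rule is an assumption.

```python
import networkx as nx

def communities_from_separation_nodes(G, sep_nodes):
    """Removing the separation-node set yields connected components that
    act as community cores; separation nodes are then greedily attached
    to the core they share the most edges with."""
    core_graph = G.copy()
    core_graph.remove_nodes_from(sep_nodes)
    cores = [set(c) for c in nx.connected_components(core_graph)]
    for v in sep_nodes:
        # Greedy rule: join the core containing the most neighbours of v.
        best = max(range(len(cores)),
                   key=lambda i: sum(1 for u in G.neighbors(v)
                                     if u in cores[i]))
        cores[best].add(v)
    return cores

# Toy example: two triangles bridged by separation node 3.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 2), (3, 4),
              (4, 5), (5, 6), (6, 4)])
print(communities_from_separation_nodes(G, sep_nodes={3}))
```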
The following article presents a memetic algorithm that applies deep reinforcement learning (DRL) to solve practically oriented dual resource constrained flexible job shop scheduling problems (DRC-FJSSP). In recent years, there has been extensive research on DRL techniques, but without considering realistic, flexible, and human-centered shopfloors. A research gap can be identified in the context of make-to-order oriented, discontinuous manufacturing, as it is often found in medium-sized companies with high service levels. From practical industry projects in this domain, we derive requirements to depict flexible machines, human workers and capabilities, setup and processing operations, material arrival times, complex job paths with parallel tasks for bill-of-material (BOM) manufacturing, sequence-dependent setup times, and (partially) automated tasks. On the other hand, intensive research has been done on metaheuristics in the context of DRC-FJSSP. However, there is a lack of suitable and generic scheduling methods that can be holistically applied in sociotechnical production and assembly processes. In this paper, we first formulate an extended DRC-FJSSP induced by the practical requirements mentioned. Then we present our proposed hybrid framework with parallel computing for multi-criteria optimization. Through numerical experiments with real-world data, we confirm that the framework generates feasible schedules efficiently and reliably. Utilizing DRL instead of random operations leads to better results and outperforms traditional approaches.
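As a hedged illustration of the practical requirements listed above (flexible machines, human workers, automated tasks, material arrival times, sequence-dependent setups), a possible data model and earliest-start computation might look as follows; none of this is the paper's actual formulation, and all field names are assumptions:

```python
from dataclasses import dataclass

# Illustrative data model only; not the paper's extended DRC-FJSSP.
@dataclass
class Operation:
    job: str
    eligible_machines: list      # flexible machine alternatives
    proc_time: dict              # machine -> processing time
    automated: bool = False      # (partially) automated: no worker needed

def earliest_start(op, machine, machine_free, worker_free,
                   prev_job_on_machine, setup, material_ready):
    """Earliest feasible start of `op` on `machine`: material must have
    arrived, the machine (and, unless automated, a worker) must be free,
    and the sequence-dependent setup after the previous job must fit."""
    t = max(material_ready[op.job], machine_free[machine])
    if not op.automated:
        t = max(t, worker_free)
    return t + setup.get((machine, prev_job_on_machine, op.job), 0)
```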
The acquisition of high-quality human annotations through crowdsourcing platforms like Amazon Mechanical Turk (MTurk) is more challenging than expected. The annotation quality may be affected by various aspects, such as the annotation instructions, the Human Intelligence Task (HIT) design, and the wages paid to annotators. To avoid potentially low-quality annotations, which could mislead the evaluation of automatic summarization system outputs, we investigate the recruitment of high-quality MTurk workers via a three-step qualification pipeline. We show that we can successfully filter out bad workers before they carry out the evaluations and obtain high-quality annotations while optimizing the use of resources. This paper can serve as a basis for the recruitment of qualified annotators in other challenging annotation tasks.
We present NusaCrowd, a collaborative initiative to collect and unite existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 117 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their effectiveness has been demonstrated in multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and its local languages. Furthermore, NusaCrowd enables the creation of the first multilingual automatic speech recognition benchmark for Indonesian and its local languages. Our work is intended to help advance natural language processing research in under-represented languages.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
State-of-the-art language models are often accurate on many question-answering benchmarks with well-defined questions. Yet, in real settings questions are often unanswerable without asking the user for clarifying information. We show that current SotA models often do not ask the user for clarification when presented with imprecise questions and instead provide incorrect answers or "hallucinate". To address this, we introduce CLAM, a framework that first uses the model to detect ambiguous questions and, if an ambiguous question is detected, prompts the model to ask the user for clarification. Furthermore, we show how to construct a scalable and cost-effective automatic evaluation protocol using an oracle language model with privileged information to provide clarifying information. We show that our method achieves a 20.15 percentage point accuracy improvement over SotA on a novel ambiguous question-answering data set derived from TriviaQA.
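A minimal sketch of the CLAM-style control flow, assuming `model` and `user` (during evaluation, the oracle language model) are available as plain text-in/text-out callables; the prompts are illustrative, not those used in the paper:

```python
def clam_answer(question, model, user):
    """Detect ambiguity, ask for a clarification if needed, then answer
    the disambiguated question. `model(prompt)` and `user(request)` are
    assumed callables, e.g. an LLM API wrapper and an oracle with
    privileged information standing in for a real user."""
    verdict = model("Is the following question ambiguous? "
                    f"Answer yes or no.\nQuestion: {question}")
    if verdict.strip().lower().startswith("yes"):
        request = model(f"Ask one short clarifying question for:\n{question}")
        clarification = user(request)      # oracle supplies the missing detail
        question = f"{question}\nClarification: {clarification}"
    return model(f"Answer the question concisely.\nQuestion: {question}")
```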